Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

… and of the estimated cluster centres for this data set. Table 2.8 shows the two estimated cluster centres for the data set. Compared with the data distribution shown in Figure 2.24, it can be seen that these cluster centres were correctly estimated.
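To illustrate how estimated centres such as those in Table 2.8 are obtained, the following is a minimal sketch using R's kmeans function. The data set is a hypothetical, randomly generated one with two well-separated clusters around (2, 2) and (6, 6); it is not the data of Figure 2.24, and the values of nstart and the random seed are arbitrary choices.

```r
# Hypothetical two-cluster data set (NOT the book's data): 50 points around
# (2, 2) and 50 points around (6, 6), drawn from 2-D normal distributions.
set.seed(1)
x <- rbind(
  cbind(rnorm(50, mean = 2, sd = 0.5), rnorm(50, mean = 2, sd = 0.5)),
  cbind(rnorm(50, mean = 6, sd = 0.5), rnorm(50, mean = 6, sd = 0.5))
)
colnames(x) <- c("x1", "x2")

# Fit a K-means model with two clusters; several random starts (nstart)
# reduce the chance of a poor local solution.
fit <- kmeans(x, centers = 2, nstart = 10)

# Estimated cluster centres: one row per cluster, one column per variable
print(round(fit$centers, 3))

# Data points as dots, estimated centres as big crosses
plot(x, pch = 20, xlab = "x1", ylab = "x2")
points(fit$centers, pch = 4, cex = 3, lwd = 3)
```

Because the cluster labels returned by kmeans are arbitrary, the rows of fit$centers may appear in either order; comparing them with the visible data distribution, as done above for Figure 2.24, is the simplest sanity check.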

Applying the K-means algorithm to a data set of two clusters. The dots stand for data points and the big crosses stand for the cluster centres estimated by the kmeans function.

The cluster centres found by the kmeans function for the data in Figure 2.24.

4.226

9.179

5.826

2.892

The output named cluster of the kmeans function shows how the data points are clustered. Suppose a K-means model with three clusters was constructed using the kmeans function for the 20 amino acids. Table 2.9 shows how the amino acids were clustered by the K-means model. It can be seen that the amino acids A, G, H, Q, S and T were grouped together. The remaining amino acids were grouped into the other two clusters.
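The cluster output can be turned into a per-cluster membership list, the layout of a table such as Table 2.9, as in the sketch below. The feature matrix here is a random placeholder, NOT the amino acid data used in the book, so the resulting groups will not reproduce Table 2.9; only the 20 one-letter amino acid codes are real.

```r
# The 20 standard amino acids (one-letter codes) as row names.
aa <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
        "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Placeholder features: random numbers standing in for real amino acid
# descriptors (hypothetical, for illustration only).
set.seed(2)
x <- matrix(rnorm(20 * 5), nrow = 20,
            dimnames = list(aa, paste0("f", 1:5)))

# K-means model with three clusters
fit <- kmeans(x, centers = 3, nstart = 10)

# The "cluster" component holds one cluster label per row (amino acid)
print(fit$cluster)

# Group the amino acid codes by their cluster label
members <- split(rownames(x), fit$cluster)
print(members)
```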

The hierarchical clustering algorithm and the K-means algorithm are typical algorithms based on different clustering strategies. It is interesting to compare these two algorithms on the amino acid data set. For this purpose, the first two principal components of a principal component analysis (PCA) [Pearson, 1901] model (the R function …) were used to generate a two-dimensional mapping space for comparing these two cluster models. The hierarchical cluster model and the K-means cluster model were found to give the same result, as seen in the map shown in Figure 2.25.
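A comparison of this kind might be sketched as follows. Several assumptions apply: the numeric columns of R's built-in iris data set stand in for the amino acid data, hclust is used with its default complete linkage on Euclidean distances, and prcomp is used as the PCA function (the source does not say which R function was used). Point shapes encode the hierarchical clusters and colours encode the K-means clusters, so agreement shows up as each shape having a single colour.

```r
# Stand-in data (assumption): standardised numeric columns of iris
x <- scale(iris[, 1:4])

# Hierarchical clustering on Euclidean distances (complete linkage by
# default), cut into three clusters
hc <- cutree(hclust(dist(x)), k = 3)

# K-means with three clusters
set.seed(3)
km <- kmeans(x, centers = 3, nstart = 25)$cluster

# Cross-tabulate the two partitions; since cluster labels are arbitrary,
# agreement appears as one dominant cell per row, not necessarily a diagonal
print(table(hierarchical = hc, kmeans = km))

# First two principal components as a 2-D map of both models
pc <- prcomp(x)$x[, 1:2]
plot(pc, pch = hc, col = km, xlab = "PC1", ylab = "PC2")
```

The cross-table gives a numerical check that complements the visual map: if the two models produce the same partition, every row and every column of the table has exactly one non-zero cell.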